Dual-Process Agentic UQ Framework

Updated 8 March 2026

Dual-Process AUQ is a framework that actively quantifies and controls uncertainty in intelligent agents by integrating fast heuristic processes with slow reflective deliberation.
It decomposes uncertainty into intrinsic and extrinsic components, enabling reliable long-horizon decision-making and mitigating the spiral of error propagation.
Practical implementations span LLM agents and multimodal systems, achieving significant performance gains and robust calibration in diverse application domains.

Dual-Process Agentic Uncertainty Quantification (AUQ) is a rigorous framework for transforming uncertainty assessment in intelligent agents from passive monitoring to active, bi-directional control. By decomposing agentic reasoning into coupled inference and control processes, AUQ enables more reliable long-horizon decision-making, calibrates both intrinsic and inherited uncertainty, and situates uncertainty as a first-class driver of agentic policy and exploration. Instantiations span LLM agents in sequential and multimodal reasoning, social-psychological dual-process models, and conformalized tool calibration in vision–language systems, providing a unifying perspective on robust agent design.

1. Motivations and Foundations

Traditional uncertainty quantification (UQ) in AI agents, especially those powered by LLMs, primarily focuses on single-turn predictions, using metrics such as predictive entropy or token-level variance to detect low-confidence outputs. However, in sequential, agentic contexts—where decisions propagate over multiple steps—these approaches are fundamentally insufficient. AUQ frameworks address the compounding and propagation of epistemic risk, often termed the “Spiral of Hallucination,” wherein small early errors propagate irreversibly through the trajectory, substantially degrading reliability (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025).

The dual-process perspective, rooted in both cognitive psychology and computational reinforcement learning, dichotomizes agent cognition into two tightly coupled subsystems: a fast, heuristic stream (often memory- or affect-driven) and a slow, analytic control mechanism. This separation underpins a spectrum of modern agentic UQ algorithms for multimodal, language, and social reasoning tasks (Zhang et al., 22 Jan 2026, Zhi et al., 11 Mar 2025, Hoey et al., 2019).

2. Dual-Process AUQ Architectures

2.1 System 1 and System 2 in AUQ

System 1 (“fast path”) provides uncertainty-aware implicit control, typically by propagating verbalized confidence or heuristic cues through memory and attention. In agentic LLM contexts, this is instantiated as Uncertainty-Aware Memory (UAM): each output is augmented with a scalar confidence $\hat{c}_t$ and a natural-language explanation $\hat{e}_t$; both are retained in the agent’s memory and context window, biasing subsequent inference away from overcommitment (Zhang et al., 22 Jan 2026).

System 2 (“slow path”) is invoked when confidence falls below a threshold, triggering Uncertainty-Aware Reflection (UAR): targeted, high-cost deliberation (e.g., best-of-N sampling or reflective planning) guided by the prior uncertainty explanation $\hat{e}_t$. System 2 selects the final decision via a consistency-weighted aggregation across reflective candidates, so the extra computational cost is incurred only when a confidence deficit arises (Zhang et al., 22 Jan 2026).

2.2 Dual-Process in Information-Theoretic AUQ

In sequence modeling, notably with LLMs, dual-process AUQ divides total predictive uncertainty at each step into an intrinsic component (local entropy given past actions) and an extrinsic component (mutual information with previous decisions). This decomposition allows agents to explicitly track how much risk is “inherited” along the trajectory and pre-allocate attention/resources (Duan et al., 20 Jun 2025).

2.3 Dual-Process in Multimodal and Social Agents

In vision–language models, agentic decision loops (dynamic region-of-interest selection) are paired with conformal prediction (CP)-calibrated tool outputs. The agent attends only to regions deemed relevant by stepwise reasoning (fast path), while CP-based calibration (slow path) provides coverage guarantees even when the underlying tools are miscalibrated (Zhi et al., 11 Mar 2025). In social-psychological agents (BayesAct), the two streams correspond to affective (connotative) and decision-theoretic (denotative) processes, with somatic coherence keeping the streams consistent and adaptively weighting the policy (Hoey et al., 2019).

3. Mathematical Formalism and Uncertainty Metrics

3.1 Propagated Uncertainty in Sequential Decision-Making

Given a trajectory $\mathcal{P} = (Y_1, \ldots, Y_T) \sim p(\mathcal{P} \mid x)$, the predictive entropy at step $t$ decomposes as:

$H(Y_t \mid x) = H(Y_t \mid Y_{1:t-1}, x) + \sum_{i=1}^{t-1} I(Y_t; Y_i \mid Y_{i+1:t-1}, x)$

Intrinsic Uncertainty (IU): $H(Y_t \mid Y_{1:t-1}, x)$
Extrinsic Uncertainty (EU): $\sum_{i=1}^{t-1} I(Y_t; Y_i \mid \ldots)$
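
For the two-step case ($T = 2$, $t = 2$) the decomposition reduces to $H(Y_2 \mid x) = H(Y_2 \mid Y_1, x) + I(Y_2; Y_1 \mid x)$. The following is a minimal numeric check of that identity, not code from the cited work. The joint distribution `joint` is a made-up toy example, and the mutual information is computed directly from its definition rather than as a difference of entropies, so the final assertion is a genuine check.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical joint p(y1, y2 | x) for a two-step trajectory with binary steps.
joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}

# Marginals p(y1 | x) and p(y2 | x).
p_y1 = {a: sum(p for (y1, _), p in joint.items() if y1 == a) for a in (0, 1)}
p_y2 = {b: sum(p for (_, y2), p in joint.items() if y2 == b) for b in (0, 1)}

# Total uncertainty of step 2: H(Y2 | x).
total = entropy(p_y2.values())

# Intrinsic uncertainty: H(Y2 | Y1, x) = sum_y1 p(y1) * H(Y2 | Y1 = y1, x).
intrinsic = sum(
    p_y1[a] * entropy([joint[(a, b)] / p_y1[a] for b in (0, 1)])
    for a in (0, 1)
)

# Extrinsic uncertainty: I(Y2; Y1 | x), computed directly from its definition.
extrinsic = sum(
    p * math.log(p / (p_y1[a] * p_y2[b]))
    for (a, b), p in joint.items()
    if p > 0
)

# The two components add up to the total.
assert abs(total - (intrinsic + extrinsic)) < 1e-12
```

In this toy example the extrinsic term is strictly positive because the second step depends on the first; for an independent joint it would vanish and all uncertainty would be intrinsic.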

The UProp estimator trades direct, intractable marginalization for trajectory-wise Pointwise Mutual Information (PMI) approximations via Monte Carlo sampling. For a sampled trajectory, IU is estimated with predictive entropy, and EU is approximated using kernel-smoothed PMI scores over per-step samples (Duan et al., 20 Jun 2025).
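
As a rough illustration of sampling-based estimation, the sketch below computes plug-in estimates of entropy and PMI from Monte Carlo samples. It is a simplification under stated assumptions, not the UProp estimator: it treats sampled step outputs as exact-match categories and omits the kernel smoothing described above. The function names `empirical_entropy` and `empirical_pmi` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def empirical_entropy(samples: Sequence[Hashable]) -> float:
    """Plug-in entropy (nats) of sampled step outputs, grouped by exact match."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def empirical_pmi(
    pairs: Sequence[Tuple[Hashable, Hashable]],
    earlier: Hashable,
    later: Hashable,
) -> float:
    """Plug-in PMI between an earlier and a later step value.

    `pairs` holds (earlier_step, later_step) values taken from sampled
    trajectories. PMI = log p(a, b) / (p(a) p(b)), from raw counts.
    """
    n = len(pairs)
    joint = sum(1 for a, b in pairs if a == earlier and b == later)
    if joint == 0:
        return float("-inf")  # the pair never co-occurred in the samples
    count_a = sum(1 for a, _ in pairs if a == earlier)
    count_b = sum(1 for _, b in pairs if b == later)
    return math.log(joint * n / (count_a * count_b))
```

With free-form LLM outputs, exact-match grouping is usually too strict, which is one motivation for the similarity-based smoothing mentioned above.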

3.2 Calibration and Selection Metrics

In the System 1/2 setting (Zhang et al., 22 Jan 2026):

The agent maintains a memory $M_t = \{ (o_i, a_i, \hat{c}_i, \hat{e}_i) \}_{i=0}^{t-1}$.
For confidence aggregators $C(\tau)$ over a trajectory $\tau$, overall quality is $C_{\text{avg}} = \frac{1}{T} \sum_{t} \hat{c}_t$; a second aggregator measures process reliability [formula lost in extraction].
Calibration metrics include Trajectory-ECE, Trajectory Brier Score, and AUROC for correct/incorrect trajectory discrimination.
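
The metrics above can be sketched with their textbook definitions applied to one confidence score and one binary success label per trajectory. This is an assumption-laden illustration: the exact binning and aggregation used for Trajectory-ECE in the cited work may differ, and the function names are hypothetical.

```python
from typing import Sequence


def average_confidence(step_confidences: Sequence[float]) -> float:
    """C_avg: mean of the per-step verbalized confidences of one trajectory."""
    return sum(step_confidences) / len(step_confidences)


def brier_score(conf: Sequence[float], success: Sequence[int]) -> float:
    """Mean squared gap between trajectory confidence and the 0/1 outcome."""
    return sum((c - s) ** 2 for c, s in zip(conf, success)) / len(conf)


def expected_calibration_error(
    conf: Sequence[float], success: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted |mean confidence - accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, s in zip(conf, success):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, s))
    n = len(conf)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(s for _, s in members) / len(members)
            ece += len(members) / n * abs(mean_conf - accuracy)
    return ece


def auroc(conf: Sequence[float], success: Sequence[int]) -> float:
    """Probability that a successful trajectory outranks a failed one (ties count 0.5)."""
    pos = [c for c, s in zip(conf, success) if s == 1]
    neg = [c for c, s in zip(conf, success) if s == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one success and one failure")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in the number of trajectories, which is acceptable for benchmark-sized evaluations; a rank-based formulation would be preferable for large sets.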

In conformal prediction calibration (Zhi et al., 11 Mar 2025):

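The coverage guarantee enforces $P\big(Y \in \mathcal{C}(X)\big) \ge 1 - \alpha$ for calibrated tool outputs, where, in standard CP notation, $\mathcal{C}(X)$ is the prediction set and $\alpha$ the target miscoverage rate.
MLLM output uncertainty is quantified by a score [formula lost in extraction] defined over $S_t$, the minimal token set covering top-$p$ probability mass at decoding step $t$.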
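
A coverage guarantee of this kind is typically obtained with split conformal prediction. The sketch below shows the standard recipe for a classifier-like tool, assuming the nonconformity score is one minus the probability assigned to a label; it is not the calibration code of the cited system, and `conformal_quantile` and `prediction_set` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Set


def conformal_quantile(cal_scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected quantile of calibration nonconformity scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score, which gives
    marginal coverage >= 1 - alpha when calibration and test data are
    exchangeable.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: keep every label
    return sorted(cal_scores)[k - 1]


def prediction_set(label_probs: Dict[str, float], qhat: float) -> Set[str]:
    """Keep every label whose nonconformity score (1 - probability) is at most qhat."""
    return {label for label, p in label_probs.items() if 1.0 - p <= qhat}
```

For example, with seven calibration scores `[0.1, 0.2, ..., 0.7]` and `alpha = 0.25`, the threshold is the sixth smallest score, and any label with probability of at least 0.4 enters the set. Larger sets signal that the tool output is less certain.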

3.3 Dual-Process Policy Switching

AUQ policies select between forward/fast (System 1) and reflective/slow (System 2) passes adaptively:

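$\text{pass}_t = \begin{cases} \text{System 1 (fast)}, & \hat{c}_t \ge \theta \\ \text{System 2 (reflective)}, & \hat{c}_t < \theta \end{cases}$

where $\theta$ is the confidence threshold that triggers UAR (Section 2.1); the symbol $\theta$ is chosen here for illustration.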

In BayesAct, when denotative entropy is low, action selection is purely instrumental; when it is high, affective, deflection-minimizing heuristics dominate (Hoey et al., 2019).

4. Algorithmic Implementations and Pseudocode

A core feature of practical AUQ frameworks is training-free deployment. All logic is embedded via prompt engineering, context manipulation, and selection wrappers.

Example AUQ Step (per Zhang et al., 22 Jan 2026): [pseudocode lost in extraction]
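
A single dual-process step, as described in Section 2.1, might be wrapped around a model call as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `propose` stands in for a hypothetical LLM call that returns an action, a verbalized confidence, and an explanation; the threshold of 0.7 and the three reflective samples are arbitrary; and "consistency-weighted aggregation" is approximated by summing the confidences of candidates that agree on an action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MemoryEntry:
    """One uncertainty-aware memory record (o_i, a_i, c_hat_i, e_hat_i)."""
    observation: str
    action: str
    confidence: float
    explanation: str


Proposal = Tuple[str, float, str]  # (action, confidence, explanation)
# Hypothetical model interface: (observation, memory, reflection hint) -> proposal.
Proposer = Callable[[str, List[MemoryEntry], str], Proposal]


def auq_step(
    observation: str,
    memory: List[MemoryEntry],
    propose: Proposer,
    threshold: float = 0.7,  # assumed value, not taken from the source
    n_reflect: int = 3,      # assumed best-of-N size (must be >= 1)
) -> Proposal:
    # System 1: one fast pass conditioned on the uncertainty-aware memory.
    action, conf, expl = propose(observation, memory, "")

    if conf < threshold:
        # System 2: sample reflective candidates, guided by the explanation.
        candidates = [propose(observation, memory, expl) for _ in range(n_reflect)]

        # Score each distinct action by the summed confidence of its supporters.
        support: Dict[str, float] = {}
        for cand_action, cand_conf, _ in candidates:
            support[cand_action] = support.get(cand_action, 0.0) + cand_conf
        action = max(support, key=lambda a: support[a])

        # Report the mean confidence and the best explanation among supporters.
        backers = [c for c in candidates if c[0] == action]
        conf = sum(c[1] for c in backers) / len(backers)
        expl = max(backers, key=lambda c: c[1])[2]

    # Store confidence and explanation so that later steps can condition on them.
    memory.append(MemoryEntry(observation, action, conf, expl))
    return action, conf, expl
```

Because the wrapper only calls `propose` and edits the memory list, it needs no model training, which matches the training-free deployment noted above.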

UProp TDP Sampling (per Duan et al., 20 Jun 2025): [pseudocode lost in extraction]

5. Applications and Empirical Findings

AUQ has been empirically validated across diverse domains:

Closed-loop planning and open-ended research: In ALFWorld, WebShop, and DeepResearch Bench, Dual-Process AUQ achieves substantial improvements in both success rate and trajectory calibration compared to single-turn or naive ensembles, e.g., ALFWorld success rate increases from 63.6% (ReAct) to 74.3% (Dual-Process), and end-state AUROC improves from 0.913 to 0.968 (Zhang et al., 22 Jan 2026).
Multimodal reasoning: The SRICE agent achieves an average 4.6% improvement over base MLLM performance across five datasets, outperforming some finetuning-based approaches (Zhi et al., 11 Mar 2025).
Uncertainty aggregation: UProp’s explicit separation of intrinsic and extrinsic uncertainty in agentic LLMs yields AUROC gains (e.g., 0.771 on AgentBench-OS vs. 0.748 for the best baseline) and boosts selective prediction reliability in safety-critical multi-step agents (Duan et al., 20 Jun 2025).
Social affective decision-making: The BayesAct dual-process model unifies affective alignment and utility maximization, providing a mathematically grounded account of the exploration/exploitation trade-off in RL and of social conformity (Hoey et al., 2019).

6. Limitations, Open Problems, and Prospective Directions

Known limitations of current AUQ frameworks include reliance on LLMs’ ability to verbalize well-calibrated confidences (which may degrade in smaller models), computational overheads of adaptive reflection and MC sampling, and heuristic aspects in mutual information estimation and contextual similarity measures (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025). In multimodal systems, conformal calibration guarantees are limited to the finite-sample regime and assume sensible calibration datasets (Zhi et al., 11 Mar 2025).

Several open directions have emerged:

Adaptive risk budgeting for per-step [symbol lost in extraction] selection and meta-controllers to tune reflection frequency.
Learned or meta-learned similarity kernels for PMI approximations in UProp.
Extensions to continuous-action domains and tighter theoretical error bounds on information-theoretic UQ.
Integration of affective alignment principles with explicit statistical UQ in multi-modal and agentic RL systems (Hoey et al., 2019).

7. Synthesis and Theoretical Significance

Dual-Process Agentic Uncertainty Quantification establishes a general, rigorous foundation for decision-aware UQ in agents operating over long-horizon, context-propagating tasks. By tightly integrating fast, memory-based uncertainty propagation with slow, targeted reflection triggered by explicit confidence deficits, it addresses the compounding risk inherent in agentic sequences. The framework unites formal tools from information theory, calibration statistics, and Bayesian affective modeling, and demonstrates broad empirical gains in both performance and reliability across agentic, multimodal, and social-psychological AI systems (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025, Zhi et al., 11 Mar 2025, Hoey et al., 2019).

Instantiation	Fast Path: System 1	Slow Path: System 2
Generic LLM AUQ (Zhang et al., 22 Jan 2026)	UAM: verbalized confidence/explanation	UAR: reflection invoked on low-confidence
UProp (Duan et al., 20 Jun 2025)	Intrinsic uncertainty (IU)	Extrinsic MI-based uncertainty (EU)
SRICE (Zhi et al., 11 Mar 2025)	Agentic RoI selection, CoT loop	CP-based tool calibration
BayesAct (Hoey et al., 2019)	Affective alignment (connotative)	Decision-theoretic/utility maximization

Dual-process AUQ thus frames uncertainty as both a continuous control statistic and a selective reflection trigger, providing a scalable and theoretically grounded solution to reliability in modern agentic AI.

References (4)

Agentic Uncertainty Quantification (2026)

UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making (2025)

Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework (2025)

"Conservatives Overfit, Liberals Underfit": The Social-Psychological Control of Affect and Uncertainty (2019)
