Dual-Process Agentic UQ Framework
- Dual-Process AUQ is a framework that actively quantifies and controls uncertainty in intelligent agents by integrating fast heuristic processes with slow reflective deliberation.
- It decomposes uncertainty into intrinsic and extrinsic components, enabling reliable long-horizon decision-making and mitigating the spiral of error propagation.
- Practical implementations span LLM agents and multimodal systems, achieving significant performance gains and robust calibration in diverse application domains.
Dual-Process Agentic Uncertainty Quantification (AUQ) is a rigorous framework for transforming uncertainty assessment in intelligent agents from passive monitoring to active, bi-directional control. By decomposing agentic reasoning into coupled inference and control processes, AUQ enables more reliable long-horizon decision-making, calibrates both intrinsic and inherited uncertainty, and situates uncertainty as a first-class driver of agentic policy and exploration. Instantiations span LLM agents in sequential and multimodal reasoning, social-psychological dual-process models, and conformalized tool calibration in vision–language systems, providing a unifying perspective on robust agent design.
1. Motivations and Foundations
Traditional uncertainty quantification (UQ) in AI agents, especially those powered by LLMs, focuses primarily on single-turn predictions, using metrics such as predictive entropy or token-level variance to detect low-confidence outputs. In sequential, agentic contexts, where decisions propagate over multiple steps, these approaches are insufficient. AUQ frameworks address the compounding and propagation of epistemic risk, often termed the “Spiral of Hallucination,” in which small early errors propagate irreversibly through the trajectory and substantially degrade reliability (Zhang et al., 22 Jan 2026).
The dual-process perspective, rooted in both cognitive psychology and computational reinforcement learning, dichotomizes agent cognition into two tightly coupled subsystems: a fast, heuristic stream (often memory- or affect-driven) and a slow, analytic control mechanism. This separation underpins a spectrum of modern agentic UQ algorithms for multimodal, language, and social reasoning tasks (Zhang et al., 22 Jan 2026, Zhi et al., 11 Mar 2025, Hoey et al., 2019).
2. Dual-Process AUQ Architectures
2.1 System 1 and System 2 in AUQ
System 1 (“fast path”) provides uncertainty-aware implicit control, typically by propagating verbalized confidence or heuristic cues through memory and attention. In agentic LLM contexts, this is instantiated as Uncertainty-Aware Memory (UAM): each output is augmented with a scalar confidence $\hat{c}_t$ and a natural-language explanation $\hat{e}_t$; both are retained in the agent’s memory and context window, biasing subsequent inference away from overcommitment (Zhang et al., 22 Jan 2026).
System 2 (“slow path”) is invoked when confidence falls below a threshold $\tau$, triggering Uncertainty-Aware Reflection (UAR): targeted, high-cost deliberation (e.g., best-of-N sampling or reflective planning) guided by the uncertainty explanation $\hat{e}_t$. System 2 selects the final decision via a consistency-weighted aggregation across reflective candidates, so the extra computational cost is incurred only when confidence deficits arise (Zhang et al., 22 Jan 2026).
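One way to picture the consistency-weighted selection is the minimal sketch below. It assumes each reflective candidate arrives as an (action, confidence) pair and that an action's score is the summed confidence of all candidates proposing it. That scoring rule and the names `Candidate` and `select_action` are illustrative assumptions, not the aggregation defined by Zhang et al.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    action: str        # proposed action, compared by exact string match
    confidence: float  # verbalized confidence in [0, 1]


def select_action(candidates: list[Candidate]) -> tuple[str, float]:
    """Return the action with the largest summed confidence and its normalized score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    weight: dict[str, float] = defaultdict(float)
    for cand in candidates:
        # Agreement between candidates accumulates weight on the shared action.
        weight[cand.action] += cand.confidence
    total = sum(weight.values())
    best = max(weight, key=weight.get)
    score = weight[best] / total if total > 0 else 0.0
    return best, score


# Hypothetical reflection output: two moderately confident candidates agree,
# one highly confident candidate disagrees.
candidates = [
    Candidate("open drawer", 0.6),
    Candidate("open drawer", 0.7),
    Candidate("go to shelf", 0.9),
]
action, score = select_action(candidates)
```

Under this rule, agreement can outweigh a single confident outlier, which is the behavior a consistency weighting is meant to capture. Exact string matching is a simplification; a real agent would likely need a softer notion of action equivalence.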
2.2 Dual-Process in Information-Theoretic AUQ
In sequence modeling, notably with LLMs, dual-process AUQ divides total predictive uncertainty at each step into an intrinsic component (local entropy given past actions) and an extrinsic component (mutual information with previous decisions). This decomposition allows agents to explicitly track how much risk is “inherited” along the trajectory and pre-allocate attention/resources (Duan et al., 20 Jun 2025).
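The split into intrinsic and extrinsic parts mirrors the standard identity H(Y_t) = H(Y_t | Y_prev) + I(Y_t; Y_prev). The sketch below checks that identity numerically on a toy two-step joint distribution. The probabilities are made up for illustration, and conditioning on the task input is dropped for brevity.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Toy joint distribution p(y_prev, y_t): rows index the earlier decision,
# columns index the current one. The values are illustrative only.
joint = np.array([[0.40, 0.10],
                  [0.05, 0.45]])

p_prev = joint.sum(axis=1)  # marginal over the earlier decision
p_t = joint.sum(axis=0)     # marginal over the current decision

total = entropy(p_t)                                  # H(Y_t)
intrinsic = entropy(joint.ravel()) - entropy(p_prev)  # H(Y_t | Y_prev)
extrinsic = total - intrinsic                         # I(Y_t; Y_prev)

# Cross-check the mutual information directly from its definition.
mi_direct = float((joint * np.log(joint / np.outer(p_prev, p_t))).sum())
```

Here `extrinsic` and `mi_direct` agree up to floating-point error. The extrinsic term is the part of the step's uncertainty that is explained by, and therefore inherited from, the earlier decision.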
2.3 Dual-Process in Multimodal and Social Agents
In vision–language models, agentic decision loops (dynamic region-of-interest selection) are paired with conformal prediction (CP)-calibrated tool outputs. The agent attends only to regions deemed relevant by stepwise reasoning (fast path), while CP-based calibration (slow path) provides coverage guarantees even when the underlying tools are miscalibrated (Zhi et al., 11 Mar 2025). In social-psychological agents (BayesAct), the streams correspond to affective (connotative) and decision-theoretic (denotative) processes, with somatic coherence keeping the two aligned and adaptively weighting the policy (Hoey et al., 2019).
3. Mathematical Formalism and Uncertainty Metrics
3.1 Propagated Uncertainty in Sequential Decision-Making
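Given a task input $x$ and a trajectory of decisions $y_{1:T} = (y_1, \dots, y_T)$, the uncertainty at step $t$ has two components:
- Intrinsic Uncertainty (IU): $\mathrm{IU}_t = H(y_t \mid y_{1:t-1}, x)$
- Extrinsic Uncertainty (EU): $\mathrm{EU}_t = I(y_t; y_{1:t-1} \mid x)$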
The UProp estimator trades direct, intractable marginalization for trajectory-wise Pointwise Mutual Information (PMI) approximations via Monte Carlo sampling. For a sampled trajectory, IU is estimated with predictive entropy, and EU is approximated using kernel-smoothed PMI scores over per-step samples (Duan et al., 20 Jun 2025).
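To make the PMI idea concrete, here is a plain Monte Carlo sketch. It is not UProp's estimator: kernel smoothing is omitted, and the function name `pmi_estimate` and the input layout are assumptions. Given Z sampled trajectories, it assumes a matrix whose entry (i, j) is the log-probability of trajectory i's step-t decision under trajectory j's prefix. The marginal is approximated by averaging over prefixes.

```python
import numpy as np


def pmi_estimate(log_cond: np.ndarray) -> np.ndarray:
    """Pointwise mutual information between a step and its own prefix.

    log_cond[i, j] = log p(y_t^(i) | prefix^(j), x) for Z sampled trajectories.
    Returns, for each i,
        log p(y_t^(i) | prefix^(i), x) - log( mean_j p(y_t^(i) | prefix^(j), x) ).
    """
    z = log_cond.shape[0]
    own = np.diag(log_cond)
    # Log of the Monte Carlo marginal, computed with the log-sum-exp trick.
    row_max = log_cond.max(axis=1)
    log_marginal = (
        row_max
        + np.log(np.exp(log_cond - row_max[:, None]).sum(axis=1))
        - np.log(z)
    )
    return own - log_marginal


# Illustrative numbers: each decision is much likelier under its own prefix.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
pmi = pmi_estimate(np.log(probs))
eu_estimate = float(pmi.mean())  # averaging PMI gives a mutual-information estimate

# If the prefix carries no information, every PMI is zero.
pmi_indep = pmi_estimate(np.log(np.full((3, 3), 0.25)))
```

A large positive PMI means the step's decision depends heavily on what came before, so any error in the prefix is likely to be inherited.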
3.2 Calibration and Selection Metrics
In the System 1/2 setting (Zhang et al., 22 Jan 2026):
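- The agent maintains a memory $M_t = \{(o_i, a_i, \hat{c}_i, \hat{e}_i)\}_{i<t}$ of past observations, actions, confidences, and explanations.
- Confidence aggregators reduce the per-step confidences to trajectory-level scores, one for overall quality and one for process reliability.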
- Calibration metrics include Trajectory-ECE, Trajectory Brier Score, and AUROC for correct/incorrect trajectory discrimination.
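The three metrics can be computed from trajectory-level confidences and binary success labels. The sketch below assumes that Trajectory-ECE and the Trajectory Brier Score are the standard definitions applied to one aggregated confidence per trajectory. That reading, the equal-width binning, and the example numbers are all assumptions rather than details taken from the source.

```python
import numpy as np


def brier_score(conf: np.ndarray, correct: np.ndarray) -> float:
    """Mean squared gap between confidence and the 0/1 outcome."""
    return float(np.mean((conf - correct) ** 2))


def expected_calibration_error(conf: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(conf, edges[1:-1])  # bin index in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return float(ece)


def auroc(conf: np.ndarray, correct: np.ndarray) -> float:
    """Probability that a successful trajectory outranks a failed one (ties count half)."""
    pos = conf[correct == 1]
    neg = conf[correct == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Six hypothetical trajectories: aggregated confidence and task success.
conf = np.array([0.95, 0.85, 0.75, 0.65, 0.35, 0.25])
correct = np.array([1, 1, 0, 1, 0, 0])

ece = expected_calibration_error(conf, correct)
brier = brier_score(conf, correct)
auc = auroc(conf, correct)
```

ECE and Brier score measure whether the stated confidence matches the success rate, while AUROC measures only ranking, so a model can score well on one and poorly on the other.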
In conformal prediction calibration (Zhi et al., 11 Mar 2025):
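- The coverage guarantee enforces $\mathbb{P}(Y \in C(X)) \ge 1 - \alpha$ for calibrated tool outputs, where $C(X)$ is the prediction set and $\alpha$ the target miscoverage rate.
- MLLM output uncertainty is quantified from the minimal token set that covers a target share of probability mass at each decoding step.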
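For readers unfamiliar with conformal calibration, the following is a minimal split-conformal sketch for a tool that outputs class probabilities. It is a generic illustration, not SRICE's procedure. The nonconformity score (one minus the probability of the true label), the function names, and the calibration values are assumptions.

```python
import math

import numpy as np


def conformal_quantile(cal_scores: np.ndarray, alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(cal_scores)[k - 1])


def prediction_set(probs: np.ndarray, q_hat: float) -> list[int]:
    """Labels whose nonconformity score 1 - p(label) is at most q_hat."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= q_hat]


# Hypothetical calibration scores: 1 - p(true label) on held-out examples.
cal_scores = np.array([0.3, 0.7, 0.1, 0.5, 0.2, 0.6, 0.4])
q_hat = conformal_quantile(cal_scores, alpha=0.25)

# Tool output for a new input: probabilities over three labels.
pred = prediction_set(np.array([0.45, 0.42, 0.13]), q_hat)
```

If calibration and test examples are exchangeable, sets built this way contain the true label with probability at least 1 - alpha, whether or not the tool's raw probabilities are well calibrated.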
3.3 Dual-Process Policy Switching
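AUQ policies select between forward/fast (System 1) and reflective/slow (System 2) passes adaptively. In the LLM-agent setting, the switch is a confidence gate on the proposed action $\hat{a}_t$:
$$a_t = \begin{cases} \hat{a}_t, & \hat{c}_t \ge \tau \\ \mathrm{UAR}(o_t, M_t, \hat{e}_t), & \hat{c}_t < \tau \end{cases}$$
In BayesAct, the switch is driven by denotative entropy: when it is low, action selection is purely instrumental; when it is high, affective, deflection-minimizing heuristics dominate (Hoey et al., 2019).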
4. Algorithmic Implementations and Pseudocode
A core feature of practical AUQ frameworks is training-free deployment. All logic is embedded via prompt engineering, context manipulation, and selection wrappers.
Example AUQ Step (per Zhang et al., 22 Jan 2026)
```text
Prompt model with M_t and o_t -> receive (â_t, ĉ_t, ê_t)
if ĉ_t >= τ:
    execute a_t = â_t
else:
    construct reflection prompt with ê_t
    sample N candidates
    select via consistency-weighted score
append (o_t, a_t, ĉ_t, ê_t) to M_t
step environment
```
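The step above can be sketched as a small Python wrapper. This is an illustrative skeleton under stated assumptions, not the authors' implementation. `propose` and `reflect` stand in for LLM calls, the default threshold `tau=0.8` is arbitrary, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryEntry:
    observation: str
    action: str
    confidence: float
    explanation: str


@dataclass
class Agent:
    # System 1: (memory, observation) -> (action, confidence, explanation)
    propose: Callable[[list[MemoryEntry], str], tuple[str, float, str]]
    # System 2: (memory, observation, explanation) -> action
    reflect: Callable[[list[MemoryEntry], str, str], str]
    tau: float = 0.8  # arbitrary illustrative threshold
    memory: list[MemoryEntry] = field(default_factory=list)

    def step(self, observation: str) -> str:
        action, conf, expl = self.propose(self.memory, observation)
        if conf < self.tau:
            # Low confidence: hand the explanation to the slow path.
            action = self.reflect(self.memory, observation, expl)
        # Confidence and explanation are stored either way, so later
        # steps can condition on them.
        self.memory.append(MemoryEntry(observation, action, conf, expl))
        return action


def fake_propose(memory, observation):
    # Stand-in for an LLM call: confident only when a key is mentioned.
    if "key" in observation:
        return "take key", 0.95, "the key is visible"
    return "look around", 0.40, "unsure which object is relevant"


def fake_reflect(memory, observation, explanation):
    # Stand-in for best-of-N reflection.
    return "examine desk"


agent = Agent(propose=fake_propose, reflect=fake_reflect)
first = agent.step("a desk and a shelf")   # low confidence, slow path
second = agent.step("a key on the desk")   # high confidence, fast path
```

Because the wrapper only touches prompts, memory, and action selection, it needs no model training, which matches the training-free deployment described above.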
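UProp TDP Sampling (per Duan et al., 20 Jun 2025)
```python
import numpy as np

def intrinsic_uncertainty(x, trajectories, sample_step, log_prob, N=8):
    """Per-step IU for Z sampled trajectories; sample_step/log_prob are assumed LLM wrappers."""
    iu = []
    for traj in trajectories:  # Z sampled trajectories
        iu.append([])
        for t in range(len(traj)):  # steps 1..T_z
            # MC sample N continuations y_t ~ p(. | y_{1:t-1}, x)
            samples = [sample_step(x, traj[:t]) for _ in range(N)]
            # IU_t: average negative log-probability of the samples
            iu[-1].append(-np.mean([log_prob(y, x, traj[:t]) for y in samples]))
            # EU_t: kernel-smoothed PMI between y_t and earlier steps (code elided in the source)
    return iu
```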
5. Applications and Empirical Findings
AUQ has been empirically validated across diverse domains:
- Closed-loop planning and open-ended research: In ALFWorld, WebShop, and DeepResearch Bench, Dual-Process AUQ achieves substantial improvements in both success rate and trajectory calibration compared to single-turn baselines or naive ensembles. For example, the ALFWorld success rate increases from 63.6% (ReAct) to 74.3% (Dual-Process), and end-state AUROC improves from 0.913 to 0.968 (Zhang et al., 22 Jan 2026).
- Multimodal reasoning: The SRICE agent achieves an average 4.6% improvement over base MLLM performance across five datasets, outperforming some finetuning-based approaches (Zhi et al., 11 Mar 2025).
- Uncertainty aggregation: UProp’s explicit separation of intrinsic and extrinsic uncertainty in agentic LLMs yields AUROC gains (e.g., 0.771 on AgentBench-OS vs. 0.748 for the best baseline) and boosts selective prediction reliability in safety-critical multi-step agents (Duan et al., 20 Jun 2025).
- Social affective decision-making: The BayesAct dual-process model unifies affective alignment and utility maximization, providing a mathematically grounded way to unify exploration and exploitation in RL and to model social conformity (Hoey et al., 2019).
6. Limitations, Open Problems, and Prospective Directions
Known limitations of current AUQ frameworks include reliance on LLMs’ ability to verbalize well-calibrated confidences (which may degrade in smaller models), computational overheads of adaptive reflection and MC sampling, and heuristic aspects in mutual information estimation and contextual similarity measures (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025). In multimodal systems, conformal calibration guarantees are limited to the finite-sample regime and assume sensible calibration datasets (Zhi et al., 11 Mar 2025).
Several open directions have emerged:
- Adaptive risk budgeting for per-step selection and meta-controllers to tune reflection frequency.
- Learned or meta-learned similarity kernels for PMI approximations in UProp.
- Extensions to continuous-action domains and tighter theoretical error bounds on information-theoretic UQ.
- Integration of affective alignment principles with explicit statistical UQ in multi-modal and agentic RL systems (Hoey et al., 2019).
7. Synthesis and Theoretical Significance
Dual-Process Agentic Uncertainty Quantification establishes a general, rigorous foundation for decision-aware UQ in agents operating over long-horizon, context-propagating tasks. By tightly integrating fast, memory-based uncertainty propagation with slow, targeted reflection triggered by explicit confidence deficits, it addresses the compounding risk inherent in agentic sequences. The framework unites formal tools from information theory, calibration statistics, and Bayesian affective modeling, and demonstrates broad empirical gains in both performance and reliability across agentic, multimodal, and social-psychological AI systems (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025, Zhi et al., 11 Mar 2025, Hoey et al., 2019).
| Instantiation | Fast Path: System 1 | Slow Path: System 2 |
|---|---|---|
| Generic LLM AUQ (Zhang et al., 22 Jan 2026) | UAM: verbalized confidence/explanation | UAR: reflection invoked on low-confidence |
| UProp (Duan et al., 20 Jun 2025) | Intrinsic uncertainty (IU) | Extrinsic MI-based uncertainty (EU) |
| SRICE (Zhi et al., 11 Mar 2025) | Agentic RoI selection, CoT loop | CP-based tool calibration |
| BayesAct (Hoey et al., 2019) | Affective alignment (connotative) | Decision-theoretic/utility maximization |
Dual-process AUQ thus frames uncertainty as both a continuous control statistic and a selective reflection trigger, providing a scalable and theoretically grounded solution to reliability in modern agentic AI.